 analysis feature


Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations

Neural Information Processing Systems

We present a neural analysis and synthesis (NANSY) framework that can manipulate the voice, pitch, and speed of an arbitrary speech signal. Most previous work has focused on using an information bottleneck to disentangle analysis features for controllable synthesis, which usually results in poor reconstruction quality. We address this issue by proposing a novel training strategy based on information perturbation. The idea is to perturb information in the original input signal (e.g., formant, pitch, and frequency response), thereby letting the synthesis networks selectively take the essential attributes needed to reconstruct the input signal. Because NANSY does not need any bottleneck structures, it enjoys both high reconstruction quality and controllability. Furthermore, NANSY does not require any labels associated with speech data, such as text and speaker information, but rather uses a new set of analysis features, i.e., a wav2vec feature and a newly proposed pitch feature, Yingram, which allows for fully self-supervised training. Taking advantage of fully self-supervised training, NANSY can be easily extended to a multilingual setting by simply training it with a multilingual dataset. The experiments show that NANSY can achieve significant improvement in performance in several applications such as zero-shot voice conversion, pitch shift, and time-scale modification.
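The information-perturbation idea in the abstract can be illustrated with a small sketch. The abstract does not say which perturbation functions are used, so the two below (a random first-order filter and random resampling) are stand-ins chosen for illustration, and the names `perturb_frequency_response`, `perturb_by_resampling`, and `make_training_pair` are hypothetical. The point it tries to show is only the training setup: the analysis input is perturbed while the reconstruction target stays untouched.

```python
import math
import random


def perturb_frequency_response(signal, rng):
    """Apply a random first-order FIR filter y[n] = x[n] + a * x[n-1].

    A random `a` tilts the spectrum (a > 0 favors lows, a < 0 favors highs).
    Illustrative stand-in only; not the filter described in the paper.
    """
    a = rng.uniform(-0.9, 0.9)
    out = []
    prev = 0.0
    for x in signal:
        out.append(x + a * prev)
        prev = x
    return out


def perturb_by_resampling(signal, rng):
    """Resample by a random ratio using linear interpolation.

    Played back at the original rate, this shifts pitch and formants together.
    """
    ratio = rng.uniform(0.8, 1.25)
    n_out = max(2, int(len(signal) / ratio))
    out = []
    for i in range(n_out):
        pos = min(i * ratio, len(signal) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return out


def make_training_pair(signal, seed=0):
    """Return (perturbed analysis input, unperturbed reconstruction target)."""
    rng = random.Random(seed)
    filtered = perturb_frequency_response(signal, rng)
    perturbed = perturb_by_resampling(filtered, rng)
    return perturbed, list(signal)


# A 220 Hz tone sampled at 16 kHz stands in for a speech clip (assumption).
tone = [math.sin(2 * math.pi * 220 * n / 16000) for n in range(1600)]
perturbed, target = make_training_pair(tone, seed=0)
```

Because the target is the original signal, a network fed only the perturbed input cannot rely on the perturbed attributes and has to obtain them from other features, which is the intuition the abstract gives for avoiding a bottleneck.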



Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations

Choi, Hyeong-Seok, Lee, Juheon, Kim, Wansoo, Lee, Jie Hwan, Heo, Hoon, Lee, Kyogu

arXiv.org Artificial Intelligence

We present a neural analysis and synthesis (NANSY) framework that can manipulate voice, pitch, and speed of an arbitrary speech signal. Most previous work has focused on using an information bottleneck to disentangle analysis features for controllable synthesis, which usually results in poor reconstruction quality. We address this issue by proposing a novel training strategy based on information perturbation. The idea is to perturb information in the original input signal (e.g., formant, pitch, and frequency response), thereby letting the synthesis networks selectively take the essential attributes needed to reconstruct the input signal. Because NANSY does not need any bottleneck structures, it enjoys both high reconstruction quality and controllability. Furthermore, NANSY does not require any labels associated with speech data, such as text and speaker information, but rather uses a new set of analysis features, i.e., a wav2vec feature and a newly proposed pitch feature, Yingram, which allows for fully self-supervised training. Taking advantage of fully self-supervised training, NANSY can be easily extended to a multilingual setting by simply training it with a multilingual dataset. The experiments show that NANSY can achieve significant improvement in performance in several applications such as zero-shot voice conversion, pitch shift, and time-scale modification.


Top ten fintech analysis features in 2018

#artificialintelligence

Banks in Singapore take the lead on artificial intelligence (AI) training. Open banking: what you need to know, by Christoffer O. Hernæs, chief digital officer, S-Banken. Open banking will fundamentally change banking the same way internet banking once did. As banks become integrated parts of digital ecosystems, the distribution of banking products will change and in the end become more valuable in the right context for the end customer. How artificial intelligence (AI), embedded tech and experience design are reframing banking. The world is changing around big financial services organisations.


Perspica Uses Machine Learning for Root Cause Analysis

#artificialintelligence

Perspica added a new root cause analysis feature to its software-as-a-service (SaaS) visibility platform. The company says its root cause analysis detects issues within layers of the application stack and recommends potential fixes. Powered by machine learning, Perspica's root cause analysis feature is able to understand what is supposed to be happening in an application's performance, based on times of the day and days of the week, says JF Huard, founder and CTO of Perspica. It is then able to analyze these metrics, detect anomalies, and determine the root cause. "Not only are we eliminating false positives, but we are also able to catch relevant problems by comparing an application's topology from day-to-day," Huard says.
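As a rough illustration of a baseline keyed on time of day and day of week, the sketch below groups past metric samples by (weekday, hour) and flags values that deviate strongly from that slot's history. This is not Perspica's algorithm, which the article does not describe; the function names, the z-score rule, the threshold of 3.0, and the sample data are all assumptions made for the example. It covers only the anomaly-detection step, not root cause determination.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baseline(history):
    """Group past samples by (weekday, hour) and return (mean, std) per slot.

    `history` is an iterable of (weekday, hour, value) tuples.
    """
    slots = defaultdict(list)
    for weekday, hour, value in history:
        slots[(weekday, hour)].append(value)
    return {slot: (mean(vals), pstdev(vals)) for slot, vals in slots.items()}


def is_anomaly(baseline, weekday, hour, value, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from its slot mean."""
    if (weekday, hour) not in baseline:
        return False  # no history for this slot, so nothing to compare against
    mu, sigma = baseline[(weekday, hour)]
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical response times in ms: Monday 09:00 is busy, Sunday 03:00 is quiet.
history = [(0, 9, v) for v in (480, 500, 520, 510, 490)]
history += [(6, 3, v) for v in (48, 50, 52, 51, 49)]
baseline = build_baseline(history)

monday_peak = is_anomaly(baseline, 0, 9, 505)   # typical for this slot
sunday_night = is_anomaly(baseline, 6, 3, 505)  # far outside this slot's history
```

The same reading of 505 ms is unremarkable on a Monday morning but anomalous on a Sunday night, which is one way a time-aware baseline can cut false positives compared with a single global threshold.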